Skip to content

[CORE-15194] schema_registry: add subject query param to GET /schemas/ids/{id} #29451

Open
nguyen-andrew wants to merge 8 commits intoredpanda-data:devfrom
nguyen-andrew:sr/subject-query-param
Open

[CORE-15194] schema_registry: add subject query param to GET /schemas/ids/{id} #29451
nguyen-andrew wants to merge 8 commits intoredpanda-data:devfrom
nguyen-andrew:sr/subject-query-param

Conversation

@nguyen-andrew
Copy link
Member

@nguyen-andrew nguyen-andrew commented Jan 28, 2026

Add an optional subject query parameter to the GET /schemas/ids/{id} endpoint. This allows specifying the context for schema lookup by extracting the context from the provided subject name (e.g., :.myctx:mysubject).

Fixes CORE-15194

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.3.x
  • v25.2.x
  • v25.1.x

Release Notes

  • none

@nguyen-andrew nguyen-andrew self-assigned this Jan 28, 2026
Copilot AI review requested due to automatic review settings January 28, 2026 22:53
@nguyen-andrew nguyen-andrew requested a review from a team as a code owner January 28, 2026 22:53
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an optional subject query parameter to the GET /schemas/ids/{id} endpoint to enable context-aware schema lookups. When provided, the subject name (which can include context in the format :.context:subject) is used to determine the context for retrieving the schema, rather than always using the default context.

Changes:

  • Modified the endpoint handler to parse and use the optional subject parameter to extract context information
  • Updated the Python test client to support passing the subject parameter
  • Added comprehensive test coverage verifying the new parameter works correctly with default and non-default contexts
  • Updated API documentation to describe the new query parameter

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
src/v/pandaproxy/schema_registry/handlers.cc Added logic to parse the optional subject query parameter, extract context from it, and use the context when looking up schemas by ID
tests/rptest/tests/schema_registry_test.py Updated test client method to accept subject parameter and added comprehensive test case covering various scenarios
src/v/pandaproxy/api/api-doc/schema_registry.json Added documentation for the new subject query parameter

@nguyen-andrew nguyen-andrew force-pushed the sr/subject-query-param branch from 6118d1f to fa5356c Compare January 28, 2026 23:22
@nguyen-andrew
Copy link
Member Author

Force push to rebase on latest dev.

@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Jan 29, 2026

CI test results

test results on build#79806
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
ScalingUpTest test_fast_node_addition null integration https://buildkite.com/redpanda/redpanda/builds/79806#019c06fb-10a4-47cf-a3ce-73980f76c3be FLAKY 19/21 Test PASSES after retries.No significant increase in flaky rate(baseline=0.0206, p0=0.3402, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3917, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ScalingUpTest&test_method=test_fast_node_addition
test results on build#80163
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
QuotaManagementUpgradeTest test_upgrade null integration https://buildkite.com/redpanda/redpanda/builds/80163#019c2b13-f7e9-4341-9fc9-7bf4ff701d1c FLAKY 10/11 Test PASSES after retries.No significant increase in flaky rate(baseline=0.0656, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1841, p1=0.1307, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=QuotaManagementUpgradeTest&test_method=test_upgrade
WriteCachingFailureInjectionE2ETest test_crash_all {"use_transactions": false} integration https://buildkite.com/redpanda/redpanda/builds/80163#019c2b16-987c-4725-87eb-b1eb8a5d16bc FLAKY 9/11 Test PASSES after retries.No significant increase in flaky rate(baseline=0.1108, p0=0.6911, reject_threshold=0.0100. adj_baseline=0.2970, p1=0.1540, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all
test results on build#80198
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
RedpandaNodeOperationsSmokeTest test_node_ops_smoke_test {"cloud_storage_type": 1, "mixed_versions": false} integration https://buildkite.com/redpanda/redpanda/builds/80198#019c2d53-20b1-4478-a0c8-576a77b8ac09 FLAKY 10/11 Test PASSES after retries.No significant increase in flaky rate(baseline=0.0068, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=RedpandaNodeOperationsSmokeTest&test_method=test_node_ops_smoke_test
RedpandaNodeOperationsSmokeTest test_node_ops_smoke_test {"cloud_storage_type": 1, "mixed_versions": true} integration https://buildkite.com/redpanda/redpanda/builds/80198#019c2d53-20b3-4950-ad61-8aaab665fb06 FLAKY 10/11 Test PASSES after retries.No significant increase in flaky rate(baseline=0.0057, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=RedpandaNodeOperationsSmokeTest&test_method=test_node_ops_smoke_test

auto subject_param = parse::query_param<std::optional<ss::sstring>>(
*rq.req, "subject");

// Extract context from subject, or use default context
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the behaviour is a bit trickier here unfortunately:

# Register the same schema in two contexts
% curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": '"$(cat ~/tasks/avro-refs/address.avsc | jq -Rs .)"', "schemaType": "AVRO"}' http://localhost:8081/subjects/:.prod:Ad
dress/versions
{"id":1,"version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}"}%

% curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": '"$(cat ~/tasks/avro-refs/address.avsc | jq -Rs .)"', "schemaType": "AVRO"}' http://localhost:8081/subjects/:.shared:Address/versions
{"id":1,"version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}"}%


# Query the schemas

% curl "http://localhost:8081/schemas/ids/1?subject=:.shared:"
{"subject":":.shared:Address","version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}","ts":1769688260232,"deleted":false}%                                                                                                                                                                                                                                    

% curl "http://localhost:8081/schemas/ids/1?subject=:.prod:"
{"subject":":.prod:Address","version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}","ts":1769688254717,"deleted":false}%                                                                                                                                                                                                                                    

% curl "http://localhost:8081/schemas/ids/1?subject=Address"
{"subject":":.prod:Address","version":1,"guid":"a3d4c656-76ec-775d-35a1-1de29d031a17","schemaType":"AVRO","schema":"{\"type\":\"record\",\"name\":\"Address\",\"fields\":[{\"name\":\"street\",\"type\":\"string\"},{\"name\":\"city\",\"type\":\"string\"}]}","ts":1769688254717,"deleted":false}%                                                                                                                                                                                                                                    

% curl "http://localhost:8081/schemas/ids/1?subject=:.:"
{"error_code":40403,"message":"Schema 1 not found"}%

% curl "http://localhost:8081/schemas/ids/1?subject=:.prod:NotAddress"
{"error_code":40403,"message":"Schema 1 not found"}%

% curl "http://localhost:8081/schemas/ids/1?subject=:.shared:NotAddress"
{"error_code":40403,"message":"Schema 1 not found"}%

I think we might need to treat the subject parameter differently depending on whether it contains only a context (empty subject) or a real subject.

To be honest, we might be able to get away with partial support of the parameter, by throwing if it contains a real subject, and not just a context. But if we can implement it fully, that would be great.

@nguyen-andrew nguyen-andrew force-pushed the sr/subject-query-param branch 4 times, most recently from 3a8f4d2 to b6d4e37 Compare February 2, 2026 04:27
@nguyen-andrew
Copy link
Member Author

@nguyen-andrew nguyen-andrew force-pushed the sr/subject-query-param branch from b6d4e37 to 26fe7f5 Compare February 2, 2026 20:31
@nguyen-andrew
Copy link
Member Author

Force push to address PR comments.

Copy link
Contributor

@pgellert pgellert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the AuthZ logic is not quite correct yet. I think the simplest behaviour we could implement here that is consistent with the earlier "allow the lookup if we have access through any subject" behaviour is that resolve_schema_across_contexts could look up both the schema definition as well as the list of subjects that would provide access to the schema, and then call the AuthZ handler only once from the handler for the full list of subjects that would provide access.

@nguyen-andrew nguyen-andrew marked this pull request as draft February 4, 2026 18:09
@nguyen-andrew nguyen-andrew force-pushed the sr/subject-query-param branch 3 times, most recently from 264bba4 to 31aaab3 Compare February 4, 2026 18:44
@nguyen-andrew
Copy link
Member Author

nguyen-andrew commented Feb 4, 2026

@nguyen-andrew nguyen-andrew marked this pull request as ready for review February 4, 2026 19:01
Update context_subject::from_string to handle context-only strings.
Rename is_default_context() to is_default_context_only() to better
reflect its behavior: it returns true only when the context is the
default context AND the subject is empty (context-only).

Add a new method is_non_default_context() that checks if a subject is
in a non-default context. This will be used in future changes to handle
subject query parameters.
This enables retrieving subjects filtered by a specific context.
@nguyen-andrew nguyen-andrew force-pushed the sr/subject-query-param branch from 31aaab3 to 07557bc Compare February 4, 2026 23:31
@nguyen-andrew
Copy link
Member Author

Force push to fix broken unit test and improve new ducktape tests.

Previously, the GET /schemas/ids/{id} endpoint resolved schemas using
only the numeric ID without any context or subject scoping. This meant
clients could not target a specific context when retrieving a schema.

This change adds support for an optional `subject` query parameter,
parsed via context_subject::from_string(), that controls how the
schema ID is resolved:

- Default context with subject (e.g., "subj" or ":.:subj"): perform
  an extended search: default context with subject first, then other
  contexts with subject, then default context without the subject
  restriction
- All other cases: resolve within a single context only

The schema definition is now retrieved using the resolved context,
ensuring the correct schema is returned in multi-context deployments.
Add test for the `subject` query parameter on GET /schemas/ids/{id},
verifying that it correctly extracts context for schema lookup.
Also update the test client to support the new parameter.
Extract shared ACL test infrastructure (setup, helpers) into a new
SchemaRegistryAclAuthzTestBase class to enable reuse by other tests.
Add SchemaRegistryContextAuthzTest to verify ACL enforcement when using
context-qualified subjects and the subject query parameter with
GET /schemas/ids/{id}. The tests cover authorization scenarios including
literal and prefix ACLs on context-qualified subjects, cross-context
search behavior, and proper 403 responses to prevent information leakage.
@nguyen-andrew nguyen-andrew force-pushed the sr/subject-query-param branch from 07557bc to 0596bc4 Compare February 5, 2026 10:00
@nguyen-andrew
Copy link
Member Author

Force push to fix bug.

Copy link
Contributor

@pgellert pgellert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments, mainly minor points + nits, but the core behaviour is looking great. Good job on figuring out this one!

Comment on lines +164 to +170
if (ctx_sub.ctx == default_context && !ctx_sub.sub().empty()) {
vlog(
srlog.error,
"resolve_schema_id_simple cannot be called with default context "
"and non-empty subject");
throw exception(error_code::internal_server_error);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd probably use a vassert here to assert these kinds of pre-conditions that we expect to hold

Comment on lines +269 to +271
for (const auto& ctx : contexts | std::views::filter([](const auto& c) {
return c != default_context;
})) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd move this c != default_context filtering into the loop as an if (...) { continue; }. I'm wondering if it is safe to iterate over an r-value range view while co_await'ing in the loop. Maybe it is, I don't know off the top of my hat and I'd have to think a bit harder about this to confirm.

ctx,
subject());
enterprise::handle_get_schemas_ids_id_authz(
rq, auth_result, {ctx_sub});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you could move the ctx_sub here, just like on L262.

srlog.error,
"resolve_schema_id_extended should only be called with non-empty "
"subject");
throw exception(error_code::internal_server_error);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same point about vassert here

Comment on lines +9368 to +9371
@cluster(num_nodes=1)
def test_subject_param_with_authorized_subject(self):
"""
GET /schemas/ids/{id}?subject=sub1 succeeds when user has READ on sub1.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a subset of the next test (test_subject_param_unauthorized_despite_other_subject_access), so let's merge them to simplify.

#include <algorithm>
#include <iterator>
#include <limits>
#include <optional>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent commit message on this one!

return is_context_only() && ctx == default_context;
}

bool is_non_default_context() const { return ctx != default_context; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is is_non_default_context() used after all (I can't see any usage)? Let's drop that commit if we don't need it.

id,
ctx_sub.ctx,
ctx_sub.sub().empty() ? ""
: ss::sstring{", subject '"} + ctx_sub.sub() + "'");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit/fyi: you could use ss::format(", subject '{}'", ctx_sub.sub()) here

result = self.sr_client.post_subjects_subject_versions(
subject="sub1", data=json.dumps({"schema": schema1_def})
)
assert result.status_code == requests.codes.ok
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd swap these asserts for self.assert_equal(...). The issue with these raw asserts is that when they fail, we don't see what the value of result.status_code was, which makes it more difficult to debug flaky tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants